Megasphaera massiliensis strain NP3T is the type strain of Megasphaera massiliensis sp. nov., a new species within the genus Megasphaera. This strain, whose genome is described here, was isolated from the fecal flora of an HIV-infected patient. M. massiliensis is a Gram-negative, obligately anaerobic coccobacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,661,757 bp genome (one chromosome, no plasmid) contains 2,577 genes: 2,516 protein-coding genes and 61 RNA genes, including 5 rRNA genes.



Introduction 

Megasphaera massiliensis sp. nov. strain NP3T (= CSUR P245 = DSM 26228) is the type strain of M. massiliensis sp. nov. This bacterium is a Gram-negative, non-sporulating, anaerobic and non-motile coccobacillus that was isolated from the stool of an HIV-infected patient as part of a culturomics study designed to cultivate individually all bacterial species within human feces [1,2].

The current classification of prokaryotes is based on a combination of phenotypic and genotypic characteristics [3,4], including 16S rRNA gene phylogeny, G+C content and DNA-DNA hybridization (DDH). Although these tools are considered the "gold standard", they exhibit several drawbacks [5,6]. To date, almost 4,000 bacterial genomes have been sequenced [7], and the cost of genomic sequencing is constantly decreasing. We therefore recently proposed adding genomic information to phenotypic criteria, including the protein profile, for the description of new bacterial species [8-29].

The genus Megasphaera (Rogosa 1971), created in 1971 [30], currently contains 5 species: M. cerevisiae (Engelmann and Weiss 1986) [31], M. elsdenii (Gutierrez et al. 1959) [30], M. micronuciformis (Marchandin et al. 2003) [32], M. paucivorans (Juvonen and Suihko 2006) [33] and M. sueciensis (Juvonen and Suihko 2006) [33]. The type species, M. elsdenii (Gutierrez et al. 1959) [30], originally classified in the genus Peptostreptococcus (Gutierrez et al. 1959), was later reclassified within a new genus, Megasphaera (Rogosa 1971), in the family Veillonellaceae (Rogosa 1971) [30]. It is an obligately anaerobic, lactate-fermenting gastrointestinal microbe of ruminant and non-ruminant mammals, including humans, and has also been isolated from a case of human endocarditis [34]. The genome of M. elsdenii strain DSM 20460, isolated from the rumen of sheep, was recently sequenced [35]. M. micronuciformis [32] was isolated from human clinical specimens, whereas M. cerevisiae [31], M. paucivorans and M. sueciensis [33] are brewery-associated species. Here we present a summary classification and a set of features for M. massiliensis sp. nov. strain NP3T (= CSUR P245 = DSM 26228), together with the description of the complete genome sequencing and annotation. These characteristics support the circumscription of the species M. massiliensis.

Classification and features 

A stool sample was collected from a 32-year-old HIV-infected patient living in Marseille, France. The patient gave written informed consent for the study. The study was approved by the Ethics Committee of the Institut Federatif de Recherche IFR48, Faculty of Medicine, Marseille, France, under agreement number 09-022.
The fecal specimen was preserved at -80°C after collection. Strain NP3T (Table 1) was isolated in January 2012 by cultivation on 5% sheep blood agar under anaerobic conditions at 37°C, following a 7-day preincubation of the stool specimen in an anaerobic blood culture bottle enriched with sterile 5% sheep rumen fluid and 5% sheep blood. The 16S rRNA gene sequence of the strain exhibited similarities with other members of the genus Megasphaera ranging from 91.5% with M. cerevisiae strain PAT1T to 95.8% with M. elsdenii strain ATCC 25940T, its closest validly published phylogenetic neighbor (Figure 1). These values are lower than the 98.7% 16S rRNA gene sequence similarity threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [4].
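The species-level decision above rests on comparing pairwise 16S rRNA gene similarities with a fixed cut-off. A minimal Python sketch of that comparison follows; the function names are hypothetical, and the way percent identity is computed here (pre-aligned sequences, gap-gap columns skipped, a gap facing a base counted as a mismatch) is one common convention assumed for illustration, not necessarily the one used to obtain the reported values.

```python
from typing import Mapping

# Threshold cited in the text [4]: below this 16S rRNA gene similarity,
# a new species may be delineated without DNA-DNA hybridization.
SPECIES_THRESHOLD = 98.7  # percent

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences carry a gap are skipped;
    a gap facing a base counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared

def below_species_threshold(similarities: Mapping[str, float]) -> bool:
    """True if every similarity to a validly published species is below the cut-off."""
    return all(value < SPECIES_THRESHOLD for value in similarities.values())

# Usage with the two extreme values reported for strain NP3T.
reported = {"M. cerevisiae PAT1T": 91.5, "M. elsdenii ATCC 25940T": 95.8}
print(below_species_threshold(reported))  # True
```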



Table 1. Classification and general features of Megasphaera massiliensis strain NP3T according to the MIGS recommendations [36]

MIGS ID     Property                  Term                               Evidence code^a
            Current classification    Domain Bacteria                    TAS [37]
                                      Phylum Firmicutes                  TAS [38-40]
                                      Class Negativicutes                TAS [41]
                                      Order Selenomonadales              TAS [41]
                                      Family Veillonellaceae             TAS [41-43]
                                      Genus Megasphaera                  TAS [30,32,43,44]
                                      Species Megasphaera massiliensis   IDA
                                      Type strain NP3T                   IDA
            Gram stain                Negative                           IDA
            Cell shape                Coccobacilli                       IDA
            Motility                  Nonmotile                          IDA
            Sporulation               Nonsporulating                     IDA
            Temperature range         Mesophile                          IDA
            Optimum temperature       37°C                               IDA
MIGS-6.3    Salinity                  Unknown
MIGS-22     Oxygen requirement        Anaerobic                          IDA
            Carbon source             Unknown
            Energy source             Unknown
MIGS-6      Habitat                   Human gut                          IDA
MIGS-15     Biotic relationship       Free living                        IDA
            Pathogenicity             Unknown
            Biosafety level           2
MIGS-14     Isolation                 Human feces
MIGS-4      Geographic location       France                             IDA
MIGS-5      Sample collection time    January 2012                       IDA
MIGS-4.1    Latitude                  43.296482                          IDA
MIGS-4.1    Longitude                 5.36978                            IDA
MIGS-4.3    Depth                     Surface                            IDA
MIGS-4.4    Altitude                  0 m above sea level                IDA

^a Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [45]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
[Figure 1: maximum-likelihood phylogenetic tree. Taxa shown: Megasphaera paucivorans (DQ223730), Megasphaera sueciensis (DQ223729), Megasphaera cerevisiae (L37040), Megasphaera elsdenii (U95027), Megasphaera massiliensis (JX424772), Anaeroglobus geminatus (AF338413), Megasphaera micronuciformis (AF473834), Veillonella criceti (AF186072), Veillonella magna (EU096495), Veillonella caviae (AY355140), Veillonella denticariosi (EF185167) and Dialister pneumosintes (X82500, outgroup).]

Figure 1. Phylogenetic tree highlighting the position of Megasphaera massiliensis strain NP3T relative to other type strains within the genus Megasphaera and other members of the family Veillonellaceae. GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA software. Numbers at the nodes are bootstrap support values (%) obtained by repeating the analysis 500 times to generate a majority-rule consensus tree. Dialister pneumosintes was used as the outgroup. The scale bar indicates a 1% nucleotide sequence divergence.
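The bootstrap values on the tree come from resampling the alignment. As an illustration only, the following Python sketch shows the nonparametric bootstrap step (resampling alignment columns with replacement) and the conversion of clade counts into percentages. Tree inference on each replicate, which MEGA performs by maximum likelihood, is deliberately left out, and all names are hypothetical.

```python
import random
from typing import List

def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """Build one pseudo-alignment by resampling columns with replacement."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[c] for c in columns) for row in alignment]

def bootstrap_replicates(alignment: List[str], n_replicates: int = 500,
                         seed: int = 0) -> List[List[str]]:
    """Generate n_replicates pseudo-alignments (500, as for Figure 1)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]

def support_percent(clade_count: int, n_replicates: int) -> float:
    """Bootstrap support: share of replicate trees containing a given clade."""
    return 100.0 * clade_count / n_replicates

# Toy three-sequence alignment; each replicate would then be passed to a
# tree-building program (not shown) and clades counted across all trees.
toy = ["ACGT", "ACGA", "TCGA"]
replicates = bootstrap_replicates(toy)
print(len(replicates))  # 500
```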
Different growth temperatures (25, 30, 37 and 45°C) were tested. Growth occurred between 30 and 45°C, and optimal growth was observed at 37°C. Colonies were transparent and smooth, with a diameter of 0.5 to 1 mm on blood-enriched Columbia agar (bioMérieux). Growth of the strain was tested on 5% sheep blood agar (bioMérieux) under anaerobic and microaerophilic conditions, using the GENbag anaer and GENbag microaer systems, respectively (bioMérieux), and in the presence of air, with or without 5% CO2. Growth occurred only under anaerobic conditions; no growth was observed under aerobic or microaerophilic conditions. A motility test was negative. Cells grown on agar are Gram-negative coccobacilli (Figure 2) with a mean diameter of 0.87 µm, and electron microscopy revealed the presence of phages (Figure 3).

Strain NP3T exhibited oxidase activity but no catalase activity. Using RAPID 32A identification strips (bioMérieux), positive reactions were observed for α-glucosidase and β-glucosidase. Negative reactions were observed for urease, arginine dihydrolase, α- and β-galactosidase, β-galactosidase-6-phosphate, α-arabinosidase, β-glucuronidase, N-acetyl-β-glucosaminidase, mannose and raffinose fermentation, α-fucosidase, alkaline phosphatase, arginine arylamidase, proline arylamidase, leucyl glycine arylamidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase, alanine arylamidase, glycine arylamidase, histidine arylamidase, glutamyl glutamic acid arylamidase and serine arylamidase. Carbohydrate metabolism was examined using an API 50CH strip (bioMérieux). Positive reactions were observed for potassium gluconate, potassium 5-ketogluconate, esculin, salicin, N-acetylglucosamine and arbutin, and for fermentation of L-arabinose, D-ribose, D-xylose, D-galactose, D-glucose, D-fructose, D-mannose, L-rhamnose, D-mannitol, D-sorbitol, D-cellobiose, D-maltose, D-lactose, D-trehalose, gentiobiose, L-fucose and D-arabitol. Weak reactions were observed for amygdalin and potassium 2-ketogluconate, and for fermentation of glycerol and D-arabinose. Table 2 summarizes the differential phenotypic characteristics of M. massiliensis, M. elsdenii and M. micronuciformis. M. massiliensis strain NP3T was susceptible to amoxicillin, amoxicillin-clavulanic acid, ceftriaxone, imipenem and doxycycline, but resistant to vancomycin, erythromycin, rifampicin, trimethoprim-sulfamethoxazole, metronidazole and ciprofloxacin.




Figure 3. Transmission electron microscopy of M. massiliensis strain NP3T, using a Morgagni 268D (Philips) at an operating voltage of 60 kV. The scale bar represents 200 nm.
Table 2. Differential characteristics of M. massiliensis strain NP3T, M. elsdenii strain DSM 20460 and M. micronuciformis strain AIP 412-00T

Properties                   M. massiliensis   M. elsdenii   M. micronuciformis
Cell diameter (µm)           0.87              1.5-3.0       0.4-0.6
Oxygen requirement           Anaerobic         Anaerobic     Anaerobic
Pigment production           +                 +             -
Gram stain                   -                 -             -
Motility                     -                 -             -
Endospore formation          -                 -             -
Indole production                              na            -
Production of
  Catalase                   -                               -
  Oxidase                    +                               na
  Nitrate reductase          na                -             -
  Urease                     -                 -             na
  β-galactosidase            -                 -             na
  N-acetyl-glucosamine       na                -             na
Acid production from
  Arabinose                  w                 -             -
  Ribose                     +                 -             na
  Mannose                    -                               -
  Mannitol                   +                               -
  Raffinose                  -                 -             -
  Sucrose                    -                 -             -
  Glycerol                   w                 -             -
  Sorbitol                   +                 -             na
  Arabitol                   +                 -             na
  Galactose                  +                 +             -
  D-glucose                  +                 +             -
  D-fructose                 +                 +             -
  D-maltose                  +                 +
  D-lactose                  +                 +
Hydrolysis of gelatin        +                 +
Habitat                      Human gut         Sheep rumen   Liver abscess, whitlow

na = data not available; w = weak
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [46] using a Microflex spectrometer (Bruker Daltonics, Germany). Briefly, a pipette tip was used to pick one isolated bacterial colony from a culture agar plate and spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics). Twelve distinct deposits were made for strain NP3T from 12 isolated colonies. Each smear was overlaid with 2 µL of matrix solution (a saturated solution of alpha-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid) and allowed to dry for five minutes. Spectra were recorded in the positive linear mode for the mass range from 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). A spectrum was obtained after 675 shots with variable laser power. The acquisition time was between 30 seconds and 1 minute per spot. The 12 NP3T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 3,769 bacteria, including the spectra from M. micronuciformis, Veillonella atypica, V. caviae, V. criceti, V. denticariosi, V. dispar, V. montpellierensis, V. parvula, V. ratti and V. rogosae, which were used as reference data (Figures 4 and 5). The identification method included the m/z range from 3,000 to 15,000 Da. For every spectrum, at most 100 peaks were taken into account and compared with the spectra in the database. The MALDI-TOF score enabled the predictive identification and discrimination of the tested species from those in the database: a score > 2 with a validly published species enabled identification at the species level, and a score < 1.7 did not enable any identification. No significant score was obtained for strain NP3T against the Bruker database, suggesting that our isolate was not a member of a known species. We added the spectrum from strain NP3T to our database for future reference (Figure 4). Figure 5 shows the MALDI-TOF MS spectrum differences between M. massiliensis and other Megasphaera and Veillonella species.
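The identification rules described above (an m/z window, a cap of 100 peaks and two score cut-offs) can be sketched in Python as follows. Two points are assumptions rather than statements from the text: that the retained peaks are the most intense ones in the window, and how scores between 1.7 and 2 are labeled, since the text does not interpret that range. The BioTyper scoring algorithm itself is not reproduced, and the function names are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

MZ_MIN, MZ_MAX = 3_000.0, 15_000.0  # m/z range used for identification
MAX_PEAKS = 100                      # at most 100 peaks per spectrum

def select_peaks(peaks: List[Peak]) -> List[Peak]:
    """Keep at most MAX_PEAKS peaks inside the m/z window.

    Assumption: when more peaks are available, the most intense are kept.
    The result is returned in ascending m/z order.
    """
    in_window = [p for p in peaks if MZ_MIN <= p[0] <= MZ_MAX]
    strongest = sorted(in_window, key=lambda p: p[1], reverse=True)[:MAX_PEAKS]
    return sorted(strongest)

def interpret_score(score: float) -> str:
    """Map a log score to the interpretation given in the text."""
    if score > 2.0:
        return "species-level identification"
    if score < 1.7:
        return "no identification"
    return "intermediate (not interpreted in the text)"

print(interpret_score(1.4))  # no identification
```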



[Figure 4: MALDI-TOF mass spectrum, intensity (arbitrary units) versus m/z]

Figure 4. Reference mass spectrum from M. massiliensis strain NP3T. Spectra from 12 individual colonies were compared and a reference spectrum was generated.
[Figure 5: gel view of MALDI-TOF spectra. Spectra shown, top to bottom: Megasphaera micronuciformis; Megasphaera massiliensis DSM 26228T; Veillonella rogosae DSM 18960T; V. ratti DSM 20736T; V. parvula DSM 2008T; V. parvula DSM 2007; V. montpellierensis DSM 17217T; V. dispar DSM 20735T; V. denticariosi DSM 19010; V. denticariosi DSM 19009T; V. criceti DSM 20734T; V. caviae DSM 20738T; V. atypica DSM 20739T; V. atypica DSM 1399]

Figure 5. Gel view comparing the M. massiliensis NP3T spectrum with those of M. micronuciformis and Veillonella species. The gel view displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel format. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a grayscale code. The color bar and the right y-axis indicate the relation between the shade in which a peak is displayed and the peak intensity in arbitrary units.



Genome sequencing information 

Genome project history 

The organism was selected for sequencing on the basis of its phenotypic differences, phylogenetic position and 16S rRNA gene similarity to other members of the genus Megasphaera, and is part of a study of the human digestive flora aiming at isolating all bacterial species within human feces [1,2]. It was the third genome of a Megasphaera species and the first sequenced genome of M. massiliensis sp. nov. The genome is deposited in GenBank under accession number CAVO00000000 and consists of 106 large contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [47].

Growth conditions and DNA isolation 

Megasphaera massiliensis sp. nov. strain NP3T (= CSUR P245 = DSM 26228) was grown anaerobically on 5% sheep blood-enriched agar (bioMérieux) at 37°C. Bacteria from ten Petri dishes were harvested and resuspended in 3 × 100 µL of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed using glass powder on the FastPrep-24 device (MP Biomedicals, Illkirch, France) for 2 × 20 seconds. The lysate was then treated with 2.5 µg/µL lysozyme (30 minutes at 37°C), and DNA was extracted using a BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified using a QIAamp kit (Qiagen). The DNA concentration, measured with the Quant-iT PicoGreen kit (Invitrogen) on a Genios Tecan fluorometer, was 82.2 ng/µL.

Genome sequencing and assembly 

A paired-end sequencing strategy was used (Roche). The library was pyrosequenced on a GS FLX Titanium sequencer (Roche), loaded on a 1/4 region of a PTP PicoTiterPlate (Roche). Five µg of DNA were mechanically fragmented on the Covaris device (KBioScience-LGC Genomics, Teddington, UK) using miniTUBE-Red 5 kb. The DNA fragmentation was visualized with the Agilent 2100 BioAnalyzer on a DNA LabChip 7500, with an optimal size of 3.3 kb.
After PCR amplification through 17 cycles followed by double size selection, the single-stranded paired-end library was loaded on an RNA Pico 6000 LabChip on the BioAnalyzer. The pattern showed an optimum at 613 bp, and the concentration was quantified on a Genios Tecan fluorometer at 3.48 pg/µL. The library concentration equivalence was calculated at 5.21 × 10^9 molecules/µL. The library was stored at -20°C until further use and was then clonally amplified with 0.5 copies per bead (cpb) in 3 emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yield of the emPCR was 9.99%, within the 5 to 20% range expected from the Roche procedure. Approximately 790,000 beads were loaded on the GS Titanium PicoTiterPlates PTP Kit 70x75 and sequenced with the GS FLX Titanium Sequencing Kit XLR70 (Roche). The run was performed overnight and then analyzed on the cluster through the gsRunBrowser and the Newbler Assembler (Roche). A total of 186,153 passed-filter wells generated 61.97 Mb, with an average read length of 332 bp. The filter-passed sequences were assembled using Newbler with 90% identity and a 40 bp overlap. The final assembly identified 114 large contigs (>1,500 bp) arranged into 28 scaffolds and generated a genome size of 2.66 Mb, which corresponds to a coverage of 23.3× genome equivalents.
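The coverage figure follows from dividing the total sequence output by the assembled genome size. A short Python check of that arithmetic, using only values reported in the paragraph above (the function name is illustrative):

```python
def fold_coverage(total_bases: float, genome_size_bp: float) -> float:
    """Average sequencing depth: total sequenced bases / genome length."""
    return total_bases / genome_size_bp

total_output_bp = 61.97e6  # 61.97 Mb of passed-filter sequence
genome_bp = 2_661_757      # final assembled genome size

print(f"{fold_coverage(total_output_bp, genome_bp):.1f}x")  # 23.3x
```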

Genome annotation 

Prodigal [48] with default parameters was used to predict open reading frames (ORFs). Predicted ORFs were excluded if they spanned a sequencing gap region. Protein functions were assessed by comparison with sequences in the GenBank [49] and Clusters of Orthologous Groups (COG) databases using BLASTP. rRNA and tRNA genes were identified using RNAmmer [50] and tRNAscan-SE 1.21 [51], respectively. SignalP [52] and TMHMM [53] were used to predict signal peptides and transmembrane helices, respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for an alignment length greater than 80 amino acids; for alignments shorter than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. Artemis [54] was used for data management and DNA Plotter [55] for visualization of genomic features. PHAST was used to identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids [56]. To estimate the mean level of nucleotide sequence similarity at the genome level between M. massiliensis and five other members of the family Veillonellaceae, orthologous proteins were detected using the Proteinortho software [57] with the following parameters: E-value 1e-5, 30% identity, 50% coverage and algebraic connectivity of 50%, and genomes were compared pairwise. For each pair of genomes, we determined the mean percentage of nucleotide sequence identity among orthologous ORFs using BLASTn.
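The thresholds listed above can be written as small predicates. The Python sketch below is a hypothetical illustration, not the pipeline actually used: it encodes the length-dependent ORFan E-value rule exactly as stated, a hit-level filter using the quoted Proteinortho values (the algebraic-connectivity criterion, which acts on the ortholog graph, is not modeled), and the per-pair mean of BLASTn identities. Applying the stricter 1e-05 cut-off to an alignment of exactly 80 amino acids, and using inclusive bounds in the ortholog filter, are assumptions, because the text does not specify these edge cases.

```python
from statistics import mean
from typing import Iterable

def passes_orfan_evalue_rule(evalue: float, alignment_length: int) -> bool:
    """E-value rule from the text: < 1e-03 if alignment > 80 aa, else < 1e-05.

    Assumption: an alignment of exactly 80 aa uses the stricter 1e-05 cut-off.
    """
    threshold = 1e-03 if alignment_length > 80 else 1e-05
    return evalue < threshold

def keep_ortholog_hit(evalue: float, identity_pct: float, coverage_pct: float) -> bool:
    """Hit-level filter with the quoted values (E-value 1e-5, 30% identity, 50% coverage).

    Inclusive bounds are an assumption; graph connectivity is not modeled.
    """
    return evalue <= 1e-5 and identity_pct >= 30.0 and coverage_pct >= 50.0

def mean_ortholog_identity(identities: Iterable[float]) -> float:
    """Mean BLASTn percent identity over the orthologous ORF pairs of one genome pair."""
    values = list(identities)
    if not values:
        raise ValueError("no orthologous pairs for this genome pair")
    return mean(values)

print(mean_ortholog_identity([80.0, 84.0]))  # 82.0
```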

Genome properties 

The genome of M. massiliensis strain NP3T is 2,661,757 bp long (28 scaffolds, one chromosome and no plasmid), with a G+C content of 50.2% (Table 4 and Figure 6). Of the 2,577 predicted genes, 2,516 were protein-coding genes and 61 were RNA genes. A total of 1,697 genes (65.8%) were assigned a putative function, and 248 genes (9.6%) were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Table 4, and the distribution of genes into COG functional categories is presented in Table 5.
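The percentages reported in Tables 4 and 5 are simple ratios; according to the Table 5 footnote, COG percentages are relative to the 2,516 protein-coding genes. A minimal Python check using counts taken from the tables (the helper name is illustrative):

```python
def percent(part: int, total: int) -> float:
    """Share of a total, in percent."""
    return 100.0 * part / total

GENOME_BP = 2_661_757
GC_BP = 1_337_412        # G+C bases (Table 4)
PROTEIN_CODING = 2_516   # protein-coding genes (denominator for Table 5)

print(round(percent(GC_BP, GENOME_BP), 1))     # 50.2
print(round(percent(138, PROTEIN_CODING), 2))  # 5.48 (COG category J)
```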



Table 3. Project information

MIGS ID      Property                   Term
MIGS-31      Finishing quality          High-quality draft
MIGS-28      Libraries used             Paired-end 3 kb library
MIGS-29      Sequencing platforms       454 GS FLX Titanium
MIGS-31.2    Fold coverage              19×
MIGS-30      Assemblers                 Newbler version 2.5.3
MIGS-32      Gene calling method        Prodigal
             INSDC ID                   PRJEB645
             GenBank ID                 CAVO00000000
             GenBank date of release    June 4, 2013
             Project relevance          Study of the human gut microbiome
Figure 6. Graphical circular map of the M. massiliensis strain NP3T chromosome. From the outside in: the outer two circles show open reading frames oriented in the forward and reverse directions, respectively (colored by COG category). The third circle displays the rRNA gene operon (red) and tRNA genes (green). The fourth circle shows the G+C% content plot. The innermost circle shows the GC skew, with purple and olive indicating negative and positive values, respectively.



Comparison with the genomes of M. elsdenii, M. micronuciformis, Veillonella dispar, V. parvula and Anaeroglobus geminatus

The draft genome of M. massiliensis strain NP3T (2.66 Mb) is larger than those of M. elsdenii (2.47 Mb), V. parvula (2.13 Mb), V. dispar (2.12 Mb), A. geminatus (1.79 Mb) and M. micronuciformis (1.77 Mb). M. massiliensis has a lower G+C content (50.2%) than M. elsdenii (52.8%) but a higher one than V. parvula, V. dispar, M. micronuciformis and A. geminatus (38.6, 38.8, 46.8 and 48.7%, respectively). M. massiliensis has more predicted protein-coding genes (2,516) than M. elsdenii, A. geminatus, V. dispar, V. parvula and M. micronuciformis (2,219, 2,148, 1,954, 1,844 and 1,774, respectively) (Table 6). In addition, M. massiliensis shared a mean nucleotide sequence identity of orthologous genes of 81.84, 69.44, 63.68, 62.92 and 70.27% with M. elsdenii, M. micronuciformis, V. dispar, V. parvula and A. geminatus, respectively (Table 6).

M. massiliensis harbors two intact prophages. Based on PHAST results, phage 1 of M. massiliensis was most closely related to Clostridium phage phiCD119, whereas phage 2 was most similar to Bacillus phage BCJA1c.
Table 4. Nucleotide content and gene count levels of the genome

Attribute                           Value        % of total^a
Genome size (bp)                    2,661,757
DNA coding region (bp)              1,479,861    93.98
DNA G+C content (bp)                1,337,412    50.2
Coding region (bp)                  1,479,861    93.98
Number of replicons                 1
Extrachromosomal elements           0
Total genes                         2,577        100
RNA genes                           61           2.39
rRNA operons                        2
Protein-coding genes                2,516        97.63
Genes with function prediction      1,697        65.8
Genes assigned to COGs              1,892        73.41
Genes with signal peptides          60           2.38
Genes with transmembrane helices    530          21.0
CRISPR repeats                      7

^a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.



Table 5. Number of genes associated with the 25 general COG functional categories

Code   Value   % of total^a   Description
J      138     5.48           Translation
A      0       0              RNA processing and modification
K      120     4.77           Transcription
L      118     4.69           Replication, recombination and repair
B      0       0              Chromatin structure and dynamics
D      23      0.91           Cell cycle control, mitosis and meiosis
Y      0       0              Nuclear structure
V      28      1.11           Defense mechanisms
T      27      1.07           Signal transduction mechanisms
M      103     4.09           Cell wall/membrane biogenesis
N      0       0              Cell motility
Z      0       0              Cytoskeleton
W      0       0              Extracellular structures
U      24      0.95           Intracellular trafficking and secretion
O      52      2.07           Post-translational modification, protein turnover, chaperones
C      147     5.84           Energy production and conversion
G      118     4.69           Carbohydrate transport and metabolism
E      163     6.48           Amino acid transport and metabolism
F      52      2.07           Nucleotide transport and metabolism
H      87      3.46           Coenzyme transport and metabolism
I      46      1.83           Lipid transport and metabolism
P      79      3.14           Inorganic ion transport and metabolism
Q      14      0.56           Secondary metabolites biosynthesis, transport and catabolism
R      217     8.62           General function prediction only
S      141     5.60           Function unknown
-      195     7.75           Not in COGs

^a The total is based on the total number of protein-coding genes in the annotated genome.
Table 6. Orthologous gene comparison and average nucleotide identity of M. massiliensis with the other compared genomes^a

Species (GenBank accession)          M. massiliensis   M. elsdenii   M. micronuciformis   V. dispar   V. parvula   A. geminatus
M. massiliensis (CAVO00000000)       2,516             1,289         1,189                987         999          1,159
M. elsdenii (HE576794)               81.84             2,219         1,175                980         989          1,145
M. micronuciformis (AECS00000000)    69.44             69.01         1,774                933         939          1,167
V. dispar (ACIK00000000)             63.68             63.08         64.92                1,954       1,081        893
V. parvula (ADFU00000000)            62.92             62.01         64.43                67.62       1,844        899
A. geminatus (AGCJ00000000)          70.27             70.50         74.22                63.87       62.99        2,148

^a Upper right, numbers of orthologous genes; lower left, mean nucleotide identities (%) of orthologous genes. Numbers on the diagonal indicate the numbers of genes of each genome.



Conclusion 

On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Megasphaera massiliensis sp. nov., which contains strain NP3T. This strain was isolated in Marseille, France.

Description of Megasphaera massiliensis sp. nov.

Megasphaera massiliensis (mas.il.ien'sis. L. gen. fem. n. massiliensis, of Massilia, the Latin name of Marseille, where strain NP3T was cultivated). It was isolated from the feces of a 32-year-old HIV-infected French patient.

Colonies are smooth and transparent, 0.5 to 1 mm in diameter, on blood-enriched Columbia agar. Growth occurs only under anaerobic conditions, between 30 and 45°C, with optimal growth at 37°C. The strain is a Gram-negative, non-endospore-forming, non-motile coccobacillus. Positive for α-glucosidase and β-glucosidase, and for potassium gluconate, potassium 5-ketogluconate, esculin, salicin, N-acetylglucosamine and arbutin. Positive for fermentation of L-arabinose, D-ribose, D-xylose, D-galactose, D-glucose, D-fructose, D-mannose, L-rhamnose, D-mannitol, D-sorbitol, D-cellobiose, D-maltose, D-lactose, D-trehalose, gentiobiose, L-fucose and D-arabitol. Negative for urease, arginine dihydrolase, α- and β-galactosidase, β-galactosidase-6-phosphate, α-arabinosidase, β-glucuronidase, N-acetyl-β-glucosaminidase, mannose and raffinose fermentation, α-fucosidase, alkaline phosphatase, arginine arylamidase, proline arylamidase, leucyl glycine arylamidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase, alanine arylamidase, glycine arylamidase, histidine arylamidase, glutamyl glutamic acid arylamidase and serine arylamidase. Weak reactions are observed for amygdalin and potassium 2-ketogluconate, and for fermentation of glycerol and D-arabinose. Cells are susceptible to amoxicillin, amoxicillin-clavulanic acid, ceftriaxone, imipenem and doxycycline, but resistant to vancomycin, erythromycin, rifampicin, trimethoprim-sulfamethoxazole, metronidazole and ciprofloxacin. The G+C content of the genome is 50.2%. The 16S rRNA gene and genome sequences are deposited in GenBank under accession numbers JX424772 and CAVO00000000, respectively. The type strain NP3T (= CSUR P245 = DSM 26228) was isolated from the fecal flora of an HIV-infected patient in Marseille, France.